In this tutorial we will introduce the packages required for geospatial work in R, show how to read in a few spatial data types, demonstrate several ways to calculate landscape metrics, and briefly touch on spatial statistics. It is assumed that you have R and RStudio installed and that you, at a minimum, understand the basic concepts of the R language (e.g. you can work through R For Cats).
Within the last several years there has been a lot of effort spent on adding spatial data handling and analysis capability to R. Thanks to the very significant effort of the package authors, we now have the foundation for doing GIS entirely within R.
The four packages that provide this foundation are:
Let’s dig in a bit deeper on each of these.
The sp package provides the primary spatial data structures for use in R. Many other packages assume your data is stored as one of the sp data structures. Getting into the details of these is beyond the scope of this workshop, but look at the introduction to sp vignette for more details. That being said, we will be working mostly with SpatialPointsDataFrame and SpatialPolygonsDataFrame. More on that later.
Getting sp added is no different than adding any other package that is on CRAN.
install.packages("sp")
library("sp")
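For a quick feel for those sp data structures, here is a minimal sketch (the data frame and its column names are made up for illustration) that promotes a plain data frame to a SpatialPointsDataFrame:

```r
library(sp)

# A toy data frame with coordinate columns and one attribute
df <- data.frame(x = c(1, 2, 3), y = c(3, 2, 1), id = c("a", "b", "c"))

# Declaring which columns hold coordinates converts it in place
coordinates(df) <- ~x + y

class(df)  # now a SpatialPointsDataFrame
bbox(df)   # bounding box of the points
```

The attribute columns (here, id) stay attached as the object's data slot, which is why so many packages can treat these objects like data frames with geometry.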
The rgdal package provides tools for reading and writing multiple spatial data formats. It is based on the Geospatial Data Abstraction Library (GDAL), a project of the Open Source Geospatial Foundation (OSGeo). The primary use of rgdal is to read various spatial data formats into R and store them as sp objects. In this workshop we will only be using rgdal to read in shapefiles, but it has utility far beyond that.
As before, nothing special is needed to get set up with rgdal on Windows. Simply:
install.packages("rgdal")
library("rgdal")
Getting set up on Linux or Mac requires more effort (i.e. GDAL needs to be installed first).
Although sp and rgdal provide raster data capabilities, they require that the full raster dataset be read into memory. This has performance implications and limits the size of the datasets you can readily work with. The raster package works around this by leaving raster data on disk and reading it in as needed. That too has some performance implications, but for the most part, in my opinion anyway, raster makes it easier to work with raster data. It also provides several additional functions for analyzing raster data.
To install, just do:
install.packages("raster")
library("raster")
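As a small sketch of the raster API (toy values, no file needed), we can build a raster in memory, fill it, and summarize it. A file-backed raster created with raster("file.tif") supports the same functions without loading all values into memory:

```r
library(raster)

# A 10 x 10 raster over a toy extent
r <- raster(nrow = 10, ncol = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- 1:ncell(r)

ncell(r)            # 100 cells
cellStats(r, mean)  # mean of all cell values: 50.5
```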
The last of the core packages for doing GIS in R is rgeos. Like rgdal, rgeos is a project of OSGeo. It is a wrapper around the Geometry Engine, Open Source (GEOS) C++ library and provides a suite of tools for conducting vector GIS analyses.
To install:
install.packages("rgeos")
library("rgeos")
For Linux and Mac the GEOS library will also need to be available. As with rgdal, this is a bit beyond the scope of this tutorial.
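To see the kind of vector analysis rgeos enables, here is a minimal sketch (toy coordinates, planar units assumed) that buffers a point and measures the result:

```r
library(sp)
library(rgeos)

# A single point at the origin
pt <- SpatialPoints(cbind(0, 0))

# A unit-radius buffer; returns a SpatialPolygons object
buf <- gBuffer(pt, width = 1)

# Area of the buffer; close to pi, slightly less because the
# "circle" is a polygon built from straight segments
gArea(buf)
```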
Things are changing quickly in the R/spatial analysis world and the most fundamental change is via the sf package. This package aims to replace sp, rgdal, and rgeos. There are a lot of reasons why this is a good thing, but that is a bit beyond the scope of this tutorial. Suffice it to say it should make things faster and simpler!
To install sf:
install.packages("sf")
library("sf")
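A minimal sketch of the sf workflow (toy coordinates of my own choosing): simple feature geometries are built from scratch and bundled with attributes, and the result behaves like a data frame with a geometry column.

```r
library(sf)

# Two point geometries in a simple feature geometry column
pts <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))

# Attach an attribute to get a full sf object
dat <- st_sf(id = c("a", "b"), geometry = pts)

class(dat)              # "sf" and "data.frame"
st_distance(dat)[1, 2]  # distance between the two points
```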
The first exercise won’t be too thrilling, but we need to make sure everyone has the packages installed.
1.) Install sp and load sp into your library. 2.) Repeat with sf, rgdal, raster, and rgeos.
Although we won’t be working with external GIS in this tutorial, there are several packages that provide ways to move back and forth from another GIS and R. For your reference, here are some.
spgrass6 provides an interface to GRASS 6, and rgrass7 does the same for the latest version of GRASS, GRASS 7.

There are many packages for doing various types of spatial analysis in R. For this tutorial we will look at only a few:
SDMTools: This package is a large suite of tools for species distribution modelling. As part of that suite, the FRAGSTATS metrics are included.
spatstat: This is a huge package and will take care of pretty much all of your point pattern analysis needs.
spdep: Another huge package for spatial analysis. This one is dense, but has all of your autocorrelation/variogram functions!
1.) Install and load SDMTools, spatstat, and spdep.

Reading in data with sp requires rgdal, which supports many different data formats. For this quick tutorial, let’s see an example of reading in a shapefile from the current working directory.
rand <- readOGR(".","rand")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "rand"
## with 100 features
## It has 1 fields
plot(rand)
To read in raster datasets we can use the raster package’s raster() function. For most raster formats it is simply a matter of pointing raster() to the file location.
nlcd <- raster("nlcd.tif")
plot(nlcd)
Lastly, reading in data with sf is also pretty straightforward.
sf_rand <- st_read("rand.shp")
## Reading layer `rand' from data source `/data/jhollist/r_landscape_tutorial/rand.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 1 field
## geometry type: POINT
## dimension: XY
## bbox: xmin: -73.41471 ymin: 41.51605 xmax: -72.57539 ymax: 42.41291
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
One of the benefits of using sf is the speed. In my tests it is about twice as fast. Let’s look at a biggish shapefile with 1 million points!
#The old way
system.time(readOGR(".","big"))
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "big"
## with 1000000 features
## It has 1 fields
## user system elapsed
## 11.565 0.225 11.797
#The sf way
system.time(st_read("big.shp"))
## Reading layer `big' from data source `/data/jhollist/r_landscape_tutorial/big.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1000000 features and 1 field
## geometry type: POINT
## dimension: XY
## bbox: xmin: -71.03768 ymin: 41.05976 xmax: -69.09763 ymax: 43.00856
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## user system elapsed
## 5.938 0.101 6.043
1.) Read in two shapefiles with readOGR(). These are unif.shp and clumped.shp. Name them unif and clumped, respectively.

Landscape metrics is a pretty big topic, and a HUGE number of metrics can be calculated. There are of course many ways to do that, but, in my experience, the most common has been to use FRAGSTATS, a stand-alone program that has seen many iterations since it was first released as a series of AML and C (I’m dating myself) tools back in 1994. A stand-alone R implementation has not been created, but most of the metrics have been included in the SDMTools package. I am not sure how much active development is going on with SDMTools (it was last updated in August of 2014), but the code should be relatively stable. The primary purpose of SDMTools is species distribution modelling, but it does have two functions for calculating landscape metrics: PatchStat() and ClassStat().
Before we get started with those, make sure SDMTools is installed and loaded.
install.packages("SDMTools")
library("SDMTools")
Let’s start with the class-level landscape metrics. For this, all we need to do is point ClassStat() at the land cover data (it is expecting a raster and supports multiple data structures).
nlcd_class_metrics <- ClassStat(mat = nlcd, cellsize = 30)
dplyr::tbl_df(nlcd_class_metrics)
## # A tibble: 15 × 38
## class n.patches total.area prop.landscape patch.density total.edge
## * <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 11 764 81593100 0.027556597 2.580272e-07 1235460
## 2 21 11461 191439000 0.064655068 3.870746e-06 12137340
## 3 22 14413 84519000 0.028544767 4.867731e-06 6488400
## 4 23 6765 36191700 0.012223094 2.284757e-06 2674140
## 5 24 1578 8551800 0.002888216 5.329410e-07 542460
## 6 31 272 7546500 0.002548694 9.186309e-08 264900
## 7 41 4610 1650803400 0.557529058 1.556944e-06 27957420
## 8 42 7639 180539100 0.060973823 2.579934e-06 8413320
## 9 43 9895 310977900 0.105027174 3.341858e-06 14941260
## 10 52 2805 30772800 0.010392958 9.473381e-07 1951800
## 11 71 683 10901700 0.003681852 2.306709e-07 533820
## 12 81 2632 147875400 0.049942248 8.889105e-07 4828560
## 13 82 404 19965600 0.006743021 1.364437e-07 626520
## 14 90 5159 184133700 0.062187834 1.742359e-06 6657780
## 15 95 993 15117300 0.005105595 3.353678e-07 776760
## # ... with 32 more variables: edge.density <dbl>,
## # landscape.shape.index <dbl>, largest.patch.index <dbl>,
## # mean.patch.area <dbl>, sd.patch.area <dbl>, min.patch.area <dbl>,
## # max.patch.area <dbl>, perimeter.area.frac.dim <dbl>,
## # mean.perim.area.ratio <dbl>, sd.perim.area.ratio <dbl>,
## # min.perim.area.ratio <dbl>, max.perim.area.ratio <dbl>,
## # mean.shape.index <dbl>, sd.shape.index <dbl>, min.shape.index <dbl>,
## # max.shape.index <dbl>, mean.frac.dim.index <dbl>,
## # sd.frac.dim.index <dbl>, min.frac.dim.index <dbl>,
## # max.frac.dim.index <dbl>, total.core.area <dbl>,
## # prop.landscape.core <dbl>, mean.patch.core.area <dbl>,
## # sd.patch.core.area <dbl>, min.patch.core.area <dbl>,
## # max.patch.core.area <dbl>, prop.like.adjacencies <dbl>,
## # aggregation.index <dbl>, lanscape.division.index <dbl>,
## # splitting.index <dbl>, effective.mesh.size <dbl>,
## # patch.cohesion.index <dbl>
Calculating the patch metrics is a bit more involved because PatchStat() is looking for a binary raster. So if we want to look at the metrics for all forest patches, we need to pull those out of our land cover. The raster package works well with logical statements and will return the cells that match as another raster. We can string those together with some “or”s to get all forest patches. Additionally, we want each individual patch labeled. We can do that with the SDMTools function ConnCompLabel().
nlcd_forest <- ConnCompLabel(nlcd==41|nlcd==42|nlcd==43)
plot(nlcd_forest)
Then to calculate the individual patch metrics.
nlcd_forest_patch_metrics <- PatchStat(mat = nlcd_forest, cellsize = 30)
dplyr::tbl_df(nlcd_forest_patch_metrics)
## # A tibble: 1,785 × 12
## patchID n.cell n.core.cell n.edges.perimeter n.edges.internal area
## <int> <int> <int> <int> <int> <dbl>
## 1 0 909564 451960 624262 3013994 818607600
## 2 1 4965 3816 1266 18594 4468500
## 3 2 49782 39889 10636 188492 44803800
## 4 3 39939 31968 8612 151144 35945100
## 5 4 530 335 214 1906 477000
## 6 5 30686 27456 3458 119286 27617400
## 7 6 35 11 30 110 31500
## 8 7 2 0 6 2 1800
## 9 8 887 410 580 2968 798300
## 10 9 1 0 4 0 900
## # ... with 1,775 more rows, and 6 more variables: core.area <dbl>,
## # perimeter <dbl>, perim.area.ratio <dbl>, shape.index <dbl>,
## # frac.dim.index <dbl>, core.area.index <dbl>
And another example for water.
nlcd_water <- ConnCompLabel(nlcd==11)
plot(nlcd_water)
Then the metrics.
nlcd_water_patch_metrics <- PatchStat(mat = nlcd_water, cellsize = 30)
dplyr::tbl_df(nlcd_water_patch_metrics)
## # A tibble: 765 × 12
## patchID n.cell n.core.cell n.edges.perimeter n.edges.internal
## <int> <int> <int> <int> <int>
## 1 0 3199261 3149797 48188 12748856
## 2 1 5 0 12 8
## 3 2 11 0 18 26
## 4 3 4 0 10 6
## 5 4 1 0 4 0
## 6 5 3 0 8 4
## 7 6 10 0 16 24
## 8 7 11 0 20 24
## 9 8 16 2 20 44
## 10 9 52 19 42 166
## # ... with 755 more rows, and 7 more variables: area <dbl>,
## # core.area <dbl>, perimeter <dbl>, perim.area.ratio <dbl>,
## # shape.index <dbl>, frac.dim.index <dbl>, core.area.index <dbl>
While there are lots of things included in the SDMTools functions, there are also many other types of metrics that are not included. In that case you might want to roll your own metrics. At this point, that is beyond the scope of this tutorial. Check back in the future!
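As a tiny taste of what rolling your own might look like, here is a sketch of a custom metric: the proportion of the landscape in a given class, computed from a raster’s cell values. The function name and approach are my own, not part of SDMTools.

```r
library(raster)

# Proportion of non-NA cells belonging to class_value
prop_class <- function(rast, class_value) {
  vals <- values(rast)
  sum(vals == class_value, na.rm = TRUE) / sum(!is.na(vals))
}

# Toy 2 x 2 raster: half class 1, half class 2
r <- raster(matrix(c(1, 1, 2, 2), nrow = 2))
prop_class(r, 1)  # 0.5
```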
The second big section on landscape ecology data analysis relates to spatial statistics. In this tutorial we will cover point pattern analysis, autocorrelation and variography, and spatial interpolation.
Point pattern analysis is supported by many packages, but the one that seems to have the most complete functionality (based on my VERY limited experience working with point patterns in R) is spatstat. If you are looking for a more complete review of point pattern analysis in R, the FWS has a good tutorial.
One issue we will need to deal with is that many of the packages for spatial analysis preceded the formal definitions of spatial data in R (e.g. sp, raster, and sf) and thus have their own data formats. We will need to manipulate the data that we have read into R and create a ppp object. Below are examples for the sp and sf objects.
First, let’s convert one of the sp objects. The maptools package currently provides many of these conversions via as() methods for sp objects.
library(maptools)
#coordinates(rand) = ~x+y
rand_ppp <- as(rand,"ppp")
plot(rand_ppp)
Since sf is so new, there aren’t many of these conversion functions available yet. We can roll our own and create the ppp object.
sf_rand_coords <- matrix(unlist(sf_rand$geometry), ncol = 2, byrow = T)
sf_rand_owin <- owin(st_bbox(sf_rand)[c(1,3)],st_bbox(sf_rand)[c(2,4)])
sf_rand_ppp <- ppp(x = sf_rand_coords[,1], y = sf_rand_coords[,2],
window = sf_rand_owin, check = T)
plot(sf_rand_ppp)
The hardest part of this is making the conversion between data structures. Now that we have the ppp object, we can start to analyze the data.
First, some exploratory visualization. Nice, but I need to think about what this is showing us…
contour(density(rand_ppp))
We can also look at a count of points per quadrat.
rand_qc <- quadratcount(sf_rand_ppp, 3,3)
plot(rand_qc)
plot(sf_rand_ppp, add = T, cex = 0.5, col = "grey")
And lastly, we can look at the Ripley’s K plot as it was presented to us in the Turner and Gardner book.
rand_pp_k <- Kest(rand_ppp)
plot(rand_pp_k)
1.) Convert the sp objects, unif and clumped, into ppp objects. 2.) Create a quadratcount object for both. Use a 3X3 quadrat design.

#From the spdep help
library(spdep)
## Loading required package: Matrix
data(oldcol)
oldcrime.lm <- lm(CRIME ~ HOVAL + INC, data = COL.OLD)
lm.morantest(oldcrime.lm,nb2listw(COL.nb, style = "W"))
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = CRIME ~ HOVAL + INC, data = COL.OLD)
## weights: nb2listw(COL.nb, style = "W")
##
## Moran I statistic standard deviate = 2.9539, p-value = 0.001569
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.235638354 -0.033302866 0.008289408
We can also calculate Moran’s I directly on a raster with the raster package’s Moran() function.

Moran(nlcd)